Efficient XML Keyword Search: From Graph Model to Tree Model

Authors

  • Yong Zeng
  • Zhifeng Bao
  • Tok Wang Ling
  • Guoliang Li
Abstract

Keyword search, as opposed to traditional structured querying, has become increasingly popular for querying XML data in recent years. XML documents usually contain ID and IDREF nodes that represent reference relationships among the data. Existing works model an XML document with ID/IDREF as a graph and compute keyword query results by graph traversal. By contrast, if ID/IDREF is not considered, an XML document can be modeled as a tree, and keyword search on an XML tree can be much more efficient using tree-based labeling techniques. A natural question is whether we need to abandon the efficient XML tree search methods and invent new, but less efficient, search methods for XML graphs. To address this problem, we propose a novel method to transform an XML graph into a tree model so that existing XML tree search methods can be exploited. The experimental results show that our solution can outperform traditional XML graph search methods by orders of magnitude in efficiency while generating a set of results similar to theirs.
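To illustrate why tree-based labeling makes keyword search on an XML tree cheap, the sketch below assigns Dewey-style labels to the elements of a small document and computes a lowest common ancestor (LCA) as the longest common prefix of two labels. This is a generic illustration, not the graph-to-tree transformation proposed in the paper; the sample document and the helper names `label_tree`, `keyword_labels`, and `lca` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Dewey = Tuple[int, ...]

# Hypothetical sample document, used only for this illustration.
DOC = (
    "<bib>"
    "<paper><title>XML keyword search</title><author>Ling</author></paper>"
    "<paper><title>Graph traversal</title><author>Bao</author></paper>"
    "</bib>"
)


def label_tree(root: ET.Element) -> Dict[Dewey, ET.Element]:
    """Assign a Dewey label to every element: the root is (0,),
    and the i-th child of a node labeled L is L + (i,)."""
    labels: Dict[Dewey, ET.Element] = {}
    stack = [(root, (0,))]
    while stack:
        node, label = stack.pop()
        labels[label] = node
        for i, child in enumerate(node):
            stack.append((child, label + (i,)))
    return labels


def keyword_labels(labels: Dict[Dewey, ET.Element], keyword: str) -> List[Dewey]:
    """Return the sorted labels of elements whose own text contains the keyword."""
    kw = keyword.lower()
    return sorted(
        label for label, node in labels.items() if kw in (node.text or "").lower()
    )


def lca(a: Dewey, b: Dewey) -> Dewey:
    """The LCA of two nodes is the longest common prefix of their Dewey labels."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


labels = label_tree(ET.fromstring(DOC))
xml_hit = keyword_labels(labels, "xml")[0]
ling_hit = keyword_labels(labels, "ling")[0]
bao_hit = keyword_labels(labels, "bao")[0]

# Both keywords fall inside the first <paper>, so the LCA is that paper.
same_paper = labels[lca(xml_hit, ling_hit)].tag
# The keywords fall in different papers, so the LCA climbs to the root.
across_papers = labels[lca(xml_hit, bao_hit)].tag
```

Because the LCA is computed from the labels alone, no traversal of the document is needed at query time. A graph model with ID/IDREF edges does not admit such a prefix test directly, which is the efficiency gap the abstract refers to.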

Similar resources

ICRA: Effective Semantics for Ranked XML Keyword Search

Keyword search is a user-friendly way to query XML databases. Most previous efforts in this area focus on keyword proximity search in XML based on either the tree data model or the graph (or digraph) data model. The tree data model for XML is generally simple and efficient for keyword proximity search. However, it cannot capture connections such as ID references in XML databases. In contrast, technique...

Keyword Search in Bibliographic XML Data

Keyword search is a user-friendly way to query text, HTML, XML documents and even relational databases. The well-known LCA (Lowest Common Ancestor) semantics is used for XML keyword search based on the tree model. However, LCA cannot exploit the information in ID references, and thus may return a large tree containing irrelevant results. Another keyword search approach based on general digra...

A System for Keyword Proximity Search on XML Databases

Keyword proximity search is a user-friendly information discovery technique that has been extensively studied for text documents. In extending this technique to structured databases, recent works [6, 7, 4, 2] provide keyword proximity search on labeled graphs. A keyword proximity search does not require the user to know the structure of the graph, the role of the objects containing the keywords...

Exploiting ID References for Effective Keyword Search in XML Documents

In this paper, we study a novel Tree + IDREF data model for keyword search in XML. In this model, we propose the novel Lowest Referred Ancestor (LRA) pair, Extended LRA (ELRA) pair and ELRA group semantics for effective and efficient keyword search. We develop efficient algorithms to compute the search results based on our semantics. An experimental study shows the superiority of our approach.

Publication date: 2013